Model Extrapolation
EpiCoDe: Boosting Model Performance Beyond Training with Extrapolation and Contrastive Decoding
Tao, Mingxu, Hu, Jie, Yang, Mingchuan, Liu, Yunhuai, Zhao, Dongyan, Feng, Yansong
The remarkable performance of large language models (LLMs) relies heavily on the availability of abundant high-quality training data. However, the high cost of acquiring annotated data often prevents models from acquiring the capabilities needed to tackle downstream tasks. In this paper, we introduce EpiCoDe, a novel method that boosts model performance in data-scarce scenarios without extra training. We first employ model extrapolation to enhance a finetuned model using its inferior version, and then adopt contrastive decoding to further reduce prediction errors by comparing the logit scores given by the extrapolated and the vanilla finetuned models. Experiments on three tasks across four different LLMs show that EpiCoDe consistently outperforms existing methods with significant and robust improvements. We also propose a new theoretical framework to reveal the mechanism behind contrastive decoding in data-scarce scenarios, which further helps us understand the effectiveness of EpiCoDe.
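To make the two-stage recipe concrete, here is a minimal PyTorch sketch, assuming the common weight-space extrapolation rule theta_ext = theta_ft + alpha * (theta_ft - theta_weak) and a standard contrastive-decoding plausibility mask; `alpha`, `beta`, `tau`, and the exact score combination are hypothetical choices, not values taken from the paper.

```python
import torch

def extrapolate(theta_ft, theta_weak, alpha=0.3):
    """Model extrapolation: push the finetuned weights further along the
    direction leading away from the inferior (weaker) checkpoint.
    theta_ext = theta_ft + alpha * (theta_ft - theta_weak)."""
    return {k: theta_ft[k] + alpha * (theta_ft[k] - theta_weak[k])
            for k in theta_ft}

@torch.no_grad()
def contrastive_logits(logits_ext, logits_ft, beta=1.0, tau=0.1):
    """Contrastive decoding step: amplify what the extrapolated ('expert')
    model prefers over the vanilla finetuned ('amateur') model, restricted
    to tokens the expert itself finds plausible."""
    probs_ext = logits_ext.softmax(dim=-1)
    # plausibility mask: keep tokens with prob >= tau * max prob under expert
    plausible = probs_ext >= tau * probs_ext.max(dim=-1, keepdim=True).values
    # contrast the two models' logit scores, as the abstract describes
    scores = logits_ext + beta * (logits_ext - logits_ft)
    return scores.masked_fill(~plausible, float("-inf"))
```

Decoding then proceeds as usual (greedy or sampled) from the masked contrastive scores at each step.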
Extrapolation Merging: Keep Improving With Extrapolation and Merging
Lin, Yiguan, Xu, Bin, Li, Yinghao, Gao, Yang
Large language models (LLMs) require instruction fine-tuning to perform different downstream tasks. However, the instruction fine-tuning phase still demands significant computational resources and labeled data, and there is no established paradigm for improving model performance without additional compute and data. Model merging aims to enhance performance by combining the parameters of different models, but without a clear optimization direction, merging does not always guarantee improved performance. In this paper, we attempt to provide such a direction. We first validate the effectiveness of model extrapolation during the instruction fine-tuning phase. We then propose Extrapolation Merging, a paradigm that continues to improve model performance without requiring extra computational resources or data. The extrapolation method supplies a clear direction for model merging, enabling a local optimization search and consequently enhancing the merged model's performance. We conduct experiments on seven different tasks, and the results show that our method consistently improves the model's performance after fine-tuning.
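A minimal sketch of the paradigm as the abstract describes it, assuming extrapolation along the fine-tuning direction theta_ft - theta_base followed by a small grid search over merge coefficients; the candidate grids and the `evaluate` callback are hypothetical, and the paper's actual search procedure may differ.

```python
def extrapolation_merging(theta_base, theta_ft, evaluate,
                          alphas=(0.1, 0.3, 0.5),
                          lambdas=(0.25, 0.5, 0.75)):
    """Use the fine-tuning direction to generate extrapolated candidates,
    merge each back with the finetuned weights, and keep the candidate that
    scores best on a held-out evaluation (local optimization search).
    `evaluate` maps a weight dict to a scalar score."""
    direction = {k: theta_ft[k] - theta_base[k] for k in theta_ft}
    best_score, best_theta = float("-inf"), theta_ft
    for a in alphas:
        # extrapolate past the finetuned model along the fine-tuning direction
        theta_ext = {k: theta_ft[k] + a * direction[k] for k in theta_ft}
        for lam in lambdas:
            # merge: linear interpolation between finetuned and extrapolated
            merged = {k: (1 - lam) * theta_ft[k] + lam * theta_ext[k]
                      for k in theta_ft}
            score = evaluate(merged)
            if score > best_score:
                best_score, best_theta = score, merged
    return best_theta
```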
On Model Extrapolation in Marginal Shapley Values
As the use of complex machine learning models continues to grow, so does the need for reliable explainability methods. One of the most popular approaches to model explainability is based on Shapley values. The two most commonly used ways of calculating Shapley values, conditional and marginal, produce different results when features are correlated. In our previous work, we demonstrated that the conditional approach is fundamentally flawed due to implicit assumptions of causality. However, it is well known that the marginal approach leads to model extrapolation into regions where the model might not be well defined. In this paper, we explore the impact of model extrapolation on Shapley values in the case of a simple linear spline model. Furthermore, we propose an approach which, while using marginal averaging, avoids model extrapolation and, with the addition of causal information, replicates causal Shapley values. Finally, we demonstrate our method on a real-data example.
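The extrapolation problem the abstract refers to is easy to exhibit in code: the marginal value function fixes the coalition features at the explained point and fills the remaining features with draws from the background marginals, so for correlated features the resulting hybrid rows can land far off the data manifold. A minimal NumPy sketch for a two-feature model, illustrating the generic marginal approach rather than the paper's proposed correction:

```python
import numpy as np

def marginal_value(f, x, background, S):
    """Marginal value function v(S): replace features outside the coalition
    S with values from the background data, ignoring feature correlations.
    The hybrid rows this creates may never occur in real data, which is
    exactly where the model is forced to extrapolate."""
    X = background.copy()
    X[:, S] = x[S]  # fix coalition features at the explained point
    return f(X).mean()

def shapley_two_features(f, x, background):
    """Exact marginal Shapley values for a model with two features."""
    v_empty = marginal_value(f, x, background, [])
    v_0 = marginal_value(f, x, background, [0])
    v_1 = marginal_value(f, x, background, [1])
    v_01 = marginal_value(f, x, background, [0, 1])
    phi_0 = 0.5 * ((v_0 - v_empty) + (v_01 - v_1))
    phi_1 = 0.5 * ((v_1 - v_empty) + (v_01 - v_0))
    return phi_0, phi_1
```

If the training data satisfies, say, x0 ≈ x1, the rows evaluated inside `marginal_value` pair x0 from the explained point with unrelated x1 values from the background, probing the model where it was never fit.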
Marginal Effects for Non-Linear Prediction Functions
Scholbeck, Christian A., Casalicchio, Giuseppe, Molnar, Christoph, Bischl, Bernd, Heumann, Christian
Beta coefficients of linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models, and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations of feature effects, either as derivatives of the prediction function or as forward differences in prediction due to a change in a feature value. While marginal effects are commonly used in many scientific fields, they have not yet been adopted as a model-agnostic interpretation method for machine learning models. This may stem from their inflexibility as a univariate feature effect and their inability to handle the non-linearities found in black-box models. We introduce a new class of marginal effects termed forward marginal effects, arguing for the abandonment of derivatives in favor of more interpretable forward differences. Furthermore, we generalize marginal effects based on forward differences to multivariate changes in feature values. To account for the non-linearity of prediction functions, we introduce a non-linearity measure for marginal effects. We argue against summarizing the feature effects of a non-linear prediction function in a single metric such as the average marginal effect. Instead, we propose partitioning the feature space to compute conditional average marginal effects on feature subspaces, which serve as conditional feature effect estimates.
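Since the abstract defines forward marginal effects as forward differences in prediction due to a (possibly multivariate) change in feature values, a short sketch may help; the function names and the boolean-mask interface for feature subspaces are illustrative choices, not the authors' API.

```python
import numpy as np

def forward_marginal_effect(f, X, step):
    """Forward marginal effect: the change in prediction caused by adding a
    finite step (a vector, possibly changing several features at once) to
    each observation: fME(x, h) = f(x + h) - f(x)."""
    return f(X + step) - f(X)

def conditional_ame(f, X, step, mask):
    """Conditional average marginal effect: average the forward marginal
    effects over a subspace of observations selected by the boolean `mask`,
    rather than summarizing the whole feature space in a single number."""
    return forward_marginal_effect(f, X[mask], step).mean()
```

The non-linearity measure the abstract mentions would then compare each forward difference against a linear (secant) approximation over the same step.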